Comparative Quality Estimation: Automatic Sentence-Level Ranking of Multiple Machine Translation Outputs

Author

  • Eleftherios Avramidis
Abstract

A machine learning mechanism is trained on human annotations to perform preference ranking. The mechanism operates at the sentence level and ranks the alternative machine translations of each source sentence. Rankings are decomposed into pairwise comparisons so that binary classifiers can be trained using black-box features derived from automatic linguistic analysis. To recompose the classifier's pairwise decisions into a ranking, this work introduces weighting the decisions by their classification probabilities, which eliminates ranking ties and increases the coefficient of the correlation with the human rankings up to 80%. The authors also demonstrate several configurations of successful automatic ranking models; the best configuration achieves acceptable correlation with human judgments (tau=0.30), which is higher than that of state-of-the-art reference-aware automatic MT evaluation metrics such as METEOR and Levenshtein distance.
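The decomposition and recomposition steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the soft-voting scheme below (each system accumulates the probability of winning each of its comparisons) is an assumption. The function names `decompose` and `recompose` and the example probabilities are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def decompose(ranks):
    """Turn one sentence's ranking {system: rank} (1 = best) into
    pairwise labels (a, b, a_is_better). Tied pairs are skipped here,
    which is an assumption of this sketch."""
    pairs = []
    for a, b in combinations(sorted(ranks), 2):
        if ranks[a] != ranks[b]:
            pairs.append((a, b, ranks[a] < ranks[b]))
    return pairs


def recompose(pairwise_probs, soft=True):
    """Combine pairwise classifier outputs into one score per system.

    pairwise_probs maps (a, b) to the predicted probability that a is
    better than b. With soft=True each decision is weighted by its
    probability (assumed scheme); with soft=False each comparison is a
    plain win/loss vote."""
    score = defaultdict(float)
    for (a, b), p in pairwise_probs.items():
        if soft:
            score[a] += p
            score[b] += 1.0 - p
        else:
            score[a] += 1.0 if p >= 0.5 else 0.0
            score[b] += 0.0 if p >= 0.5 else 1.0
    return dict(score)


def to_ranking(score):
    """Order systems from best to worst by their accumulated score."""
    return sorted(score, key=lambda s: -score[s])


# Hypothetical classifier outputs forming a cycle: A > B, B > C, C > A.
probs = {("A", "B"): 0.9, ("B", "C"): 0.6, ("A", "C"): 0.4}
hard_scores = recompose(probs, soft=False)  # every system wins once: a three-way tie
soft_scores = recompose(probs, soft=True)   # probabilities separate the systems
ranking = to_ranking(soft_scores)
```

In this toy case the hard votes leave all three systems tied, while the probability-weighted scores order them, which matches the abstract's claim that the weighting eliminates ranking ties.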

Similar resources

Machine learning methods for comparative and time-oriented Quality Estimation of Machine Translation output

This paper describes a set of experiments on two sub-tasks of Quality Estimation of Machine Translation (MT) output. Sentence-level ranking of alternative MT outputs is done with pairwise classifiers using Logistic Regression with blackbox features originating from PCFG Parsing, language models and various counts. Post-editing time prediction uses regression models, additionally fed with new el...

Quality Estimation Of Machine Translation Outputs Through Stemming

Machine Translation is a challenging problem for Indian languages. Every day we can see some machine translators being developed, but getting a high quality automatic translation is still a very distant dream. The correct translated sentence for Hindi language is rarely found. In this paper, we are emphasizing on English-Hindi language pair, so in order to preserve the correct MT output we ...

Selecting Feature Sets for Comparative and Time-Oriented Quality Estimation of Machine Translation Output

This paper describes a set of experiments on two sub-tasks of Quality Estimation of Machine Translation (MT) output. Sentence-level ranking of alternative MT outputs is done with pairwise classifiers using Logistic Regression with blackbox features originating from PCFG Parsing, language models and various counts. Post-editing time prediction uses regression models, additionally fed with new el...

Automatic Post-Editing based on SMT and its selective application by Sentence-Level Automatic Quality Evaluation

In the computer-assisted translation process with machine translation (MT), post-editing costs time and effort on the part of humans. To solve this problem, some have attempted to automate post-editing. Post-editing isn’t always necessary, however, when MT outputs are of adequate quality for humans. This means that we need to be able to estimate the translation quality of each translated sentenc...

Analysing Quality of English-hindi Machine Translation Engine Outputs Using Baysian Classification

This paper considers the problem of estimating the quality of machine translation outputs which are independent of human intervention and are generally addressed using machine learning techniques. There are various measures through which a machine learns translation quality. Automatic Evaluation metrics produce good correlation at corpus level but cannot produce the same results at the same se...

Publication date: 2012